Abstract

In this paper we survey the computational time complexity of assorted simple stochastic game 
problems, and we give an overview of the best known algorithms associated with each problem. 



1 Introduction 



A simple stochastic game G = (V, E) is a directed graph whose vertices are partitioned into four disjoint
sets $V_{\max}$, $V_{\min}$, $V_{\text{avg}}$ and $V_{\text{sink}}$. Depending on the set a vertex belongs to, it is called a max, min, average
or sink vertex, respectively. In addition, one of the vertices in V is designated as the start
vertex. $V_{\text{sink}}$ contains exactly two vertices, called the 1-sink and the 0-sink. The 1-sink and the 0-sink
have no children, while all other vertices have exactly two distinct children. Loop edges e = (i, i) are
allowed. In the rest of this paper we assume w.l.o.g. that V = {1, ..., n} where n - 1 is the 0-sink and n
is the 1-sink.

Figure 1: A simple stochastic game with 8 vertices. Vertex 1 is the start vertex. The numbers in parentheses
denote the optimal vertex values.

The game is played by two players, called the max player and the min player, who have diametrically
opposed objectives. At the start of the game, a token is placed on the start vertex. In each round, the
token is moved from a vertex to one of its children according to the following rule: whenever the token is
positioned on a max vertex, the max player decides to which child the token is moved; whenever the token
is positioned on a min vertex, the min player decides to which child the token is moved; and whenever the
token is positioned on an average vertex, the token is moved to each of its two children with probability 1/2.
Average vertices hence model randomness in this kind of stochastic game. The game ends when the
token reaches a sink vertex. The max player wins the game if the token reaches the 1-sink. The min
player wins the game in all other cases, that is, either when the token reaches the 0-sink or when he can
force an infinite play in which the token reaches neither sink vertex.

The optimal value of a vertex is defined as the probability that the max player wins the game starting
at that vertex, assuming both players employ optimal strategies (a strategy is optimal if the probability
of winning the game with it is greater than or equal to that of any other strategy, regardless of the strategy
chosen by the opponent). We will later see that both players of a simple stochastic game possess optimal
strategies, albeit not unique ones. The value of the game is defined to be the optimal value of its start
vertex, and the optimal value vector of the game is defined to be the vector whose components are the
optimal vertex values of the game.

The most intriguing question to be asked about a simple stochastic game is: what is its value? As we 
shall see later, the complexity of the associated function problem is polynomial-time equivalent to that 
of finding the optimal value vector of the game. The problem of computing the optimal value vector of a 
simple stochastic game has been studied extensively from an algorithmic point of view [2, 5, 11, 12, 15, 16]
(no polynomial time algorithm has been found), but the author is not aware of any previous efforts in 
studying its complexity. Therefore, we shall prove containment of the problem in FNP in a subsequent 
section of this paper. 

The question about a simple stochastic game's value is not only intriguing, it also has practical relevance.
This is because stochastic games are nowadays used as a formal tool in a variety of different application
areas, including automated software verification and controller optimization, where the game's value
constitutes the single most crucial piece of information. Apart from this, there exist some other motivations behind
the study of simple stochastic games. Most of these are related to the SSG-VALUE problem - given a
simple stochastic game, is its value greater than 1/2? Though Condon [4] was able to show that the
SSG-VALUE problem is contained in NP ∩ coNP, and despite significant efforts [2, 8, 9] to obtain a hardness
result for a specific complexity class, the problem's exact complexity status is unknown. Thus one of
the motivations behind the study of simple stochastic games is the desire to find a complexity class for
which SSG-VALUE is complete, so as to obtain a clue whether the problem is intractable² or not. The
present consensus is that, since it is contained in NP ∩ coNP, SSG-VALUE is very likely not NP-complete
and may allow for more efficient algorithms than the exponential ones currently known. Condon
reinforces this hypothesis by stating that SSG-VALUE constitutes one of the rare combinatorial problems
known to be contained in NP ∩ coNP for which containment in P is an open question.

A last motivation behind the study of simple stochastic games can be expressed as the "kill two birds 
with one stone" factor; many computational problems, such as the generalized linear complementarity 
problem (GLCP) and the minimum stable circuit problem for min/max/avg-circuits (STABLE-CIRCUIT),
were shown [8, 9] to be polynomial-time reducible to SSG-VALUE - hence more efficient algorithms for 
SSG-VALUE will also yield more efficient algorithms for those other problems. The rest of this paper is 

1 We will refer to the computational time complexity of a problem as the problem's complexity.
2 Intractable computational problems are those that feature exponential time or space complexities. 






organized as follows: in section 2, we will restate the essential definitions for simple stochastic games 
as given in Condon's initial paper on the subject. In section 3, we will provide a more detailed view of 
simple stochastic games which will enable us to conduct our complexity survey in section 4. The paper 
concludes with a summary of the important points and an overview of open problems in section 5. 

2 Definitions 

2.1 Player Strategies 

Given a simple stochastic game, a strategy $\tau$ for the min player (or min strategy) is a subset of the
game's edges such that for each min vertex $i$ with children $j$ and $k$, either $(i, j) \in \tau$ or $(i, k) \in \tau$ holds.
Substituting $\sigma$ for $\tau$ and max for min in the above sentence, we obtain the analogous definition of the
max strategy $\sigma$. Informally, a strategy denotes the player's choice to which child the token is to be moved
whenever it is positioned on a vertex belonging to that player. The reason for defining player strategies
like this will be explained in the next section of this paper.

2.2 Reduced Games 

Given a simple stochastic game G = (V, E) and a strategy $\tau$ to be employed by the min player, the reduced
game $G_\tau$ is defined to be the sub-graph of G obtained by removing all edges from G that are not selected
by the min strategy $\tau$, i.e.

$$G_\tau = (V, E_\tau) \quad \text{where} \quad E_\tau = E \setminus \{ (i, j) \in E : i \in V_{\min} \wedge (i, j) \notin \tau \}$$

$G_\tau$ can be regarded as the 1-player equivalent of G, where it is certain that the min player employs $\tau$. In
a similar manner, the reduced games $G_\sigma$ and $G_{\tau,\sigma}$ are defined as

$$G_\sigma = (V, E_\sigma) \quad \text{where} \quad E_\sigma = E \setminus \{ (i, j) \in E : i \in V_{\max} \wedge (i, j) \notin \sigma \}$$

$$G_{\tau,\sigma} = (V, E_{\tau,\sigma}) \quad \text{where} \quad E_{\tau,\sigma} = E \setminus \{ (i, j) \in E : i \in V_{\min} \cup V_{\max} \wedge (i, j) \notin \tau \cup \sigma \}$$

We observe that in the reduced game $G_{\tau,\sigma}$, the strategies of both players are fixed to $\tau$ and $\sigma$ and the
winner is decided by a (more or less) random walk of the token on the graph of $G_{\tau,\sigma}$.
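
The three definitions can be mirrored by one small routine. The following Python sketch is illustrative only and rests on assumptions that are not part of the paper: a hypothetical encoding in which `kind[i]` names the vertex type and `children[i]` lists the children of vertex i, and strategies given as mappings from a player's vertices to the chosen child instead of as edge sets.

```python
def reduce_game(kind, children, tau=None, sigma=None):
    """Return the child lists of the reduced game G_tau, G_sigma or G_{tau,sigma}.

    kind[i]     -- "max", "min", "avg" or "sink" (hypothetical encoding)
    children[i] -- tuple with the children of vertex i (empty for sinks)
    tau, sigma  -- optional strategies: dicts mapping each min (max) vertex
                   to the child selected by the min (max) player
    """
    reduced = {}
    for i, kids in children.items():
        if kind[i] == "min" and tau is not None:
            choice = tau[i]
        elif kind[i] == "max" and sigma is not None:
            choice = sigma[i]
        else:
            # Vertex is not controlled by a fixed strategy: keep all edges.
            reduced[i] = tuple(kids)
            continue
        if choice not in kids:
            raise ValueError(f"({i}, {choice}) is not an edge of the game")
        # Only the edge selected by the strategy survives.
        reduced[i] = (choice,)
    return reduced


# Hypothetical 5-vertex game: vertex 4 is the 0-sink, vertex 5 the 1-sink.
KIND = {1: "max", 2: "min", 3: "avg", 4: "sink", 5: "sink"}
CHILDREN = {1: (2, 3), 2: (3, 4), 3: (4, 5), 4: (), 5: ()}
```

Calling `reduce_game(KIND, CHILDREN, tau={2: 4})` leaves the max and average vertices untouched, while passing both strategies leaves exactly one outgoing edge at every player vertex.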

2.3 Vertex Values 

Given a reduced game $G_{\tau,\sigma}$ (or alternatively a simple stochastic game G and a pair of strategies $\tau$ and $\sigma$
to be employed by the players), the value of vertex $i$, $v_{\tau,\sigma}(i)$, is defined to be the probability that the token
reaches the 1-sink in a random walk on the graph of $G_{\tau,\sigma}$, starting at vertex $i$. The value vector $v_{\tau,\sigma}$ of
$G_{\tau,\sigma}$ is defined to be the vector whose components are the vertex values of $G_{\tau,\sigma}$.

2.4 Optimal Player Strategies and Optimal Vertex Values



Given a simple stochastic game G, a strategy $\tau_{opt}$ for the min player satisfying

$$v_{\tau_{opt},\sigma_{opt}}(i) \le v_{\tau,\sigma_{opt}}(i) \quad \forall i, \tau$$

is called an optimal strategy for the min player. Similarly, a strategy $\sigma_{opt}$ for the max player satisfying

$$v_{\tau_{opt},\sigma_{opt}}(i) \ge v_{\tau_{opt},\sigma}(i) \quad \forall i, \sigma$$

is called an optimal strategy for the max player. Informally, the formulas say that by employing an
optimal strategy for G, a player assures himself the highest probability of winning G no matter what the
start vertex is.

The optimal value of vertex $i$, $v(i)$, is defined as

$$v(i) = v_{\tau_{opt},\sigma_{opt}}(i)$$

where $\tau_{opt}$ and $\sigma_{opt}$ are a pair of optimal player strategies for G. The optimal value of a vertex denotes
the probability that the max player wins the game starting at that vertex, assuming both players employ
optimal strategies. The optimal value vector $v$ of G is defined to be the vector whose components are the
optimal vertex values of G. Somewhat confusingly, the value of a simple stochastic game is the optimal (and not
just any) value of the start vertex. It is also important not to confuse vertex values with optimal vertex
values, as their meanings differ.

2.5 Stopping Simple Stochastic Games 

A stopping simple stochastic game is a simple stochastic game which does not permit infinite plays, i.e.
the token always reaches a sink vertex after a finite number of rounds, regardless of the strategies chosen
by the players. More precisely, if for all pairs of strategies $\tau$, $\sigma$ each vertex of the reduced game $G_{\tau,\sigma}$ has
a path to a sink vertex, then G is stopping. Stopping stochastic games are also referred to as stochastic
games that halt with probability 1.

3 Properties of Simple Stochastic Games 

From the introduction, we can already derive some important game-theoretic properties of simple stochastic
games. We will use these in the elaborations to follow:

- determined: the optimal value vector exists and is unique.

- finite: the game has n states, and in each state the players have at most two actions to choose from.

- zero sum: in every state of the game, the win expectancy for one player is the complement of the
win expectancy for the opponent.

- perfect information: the players act sequentially and each player is completely informed about the
history and the state of the game.

- reachability objective: the objective of the players is to force the token to reach their respective
sink vertex.

- 2½ player: the coin-flipping ruler over average vertices (nature) is given the status of a half player.

Let us start by discussing player strategies for simple stochastic games. The rationale for defining player
strategies as we did originates from the initial work on stochastic games by Shapley [14]; he showed
that perfect information stopping stochastic games - and hence stopping simple stochastic games - have
a Nash equilibrium [3] in pure³, memoryless⁴ optimal strategies (Somla [15] points out in a footnote
that it is possible to extend this result to non-stopping simple stochastic games, using advanced proof
techniques). Because of this fact, it suffices to denote a player's strategy by a prescription that, for each
vertex belonging to the player, states to which child the token is to be moved; such a prescription can be
modeled as a subset of the game's edges.

Furthermore, as a direct consequence of the definition of optimal strategies, we find that a min strategy
is optimal for a particular game G if and only if it is locally optimal (or greedy) at every min vertex of G
with respect to the optimal value vector. That is to say, the best strategy for the min player is to always
move the token from a min vertex to the child which has the lower optimal value. The same statement
can be made for the max player, except that the max player always moves the token from a max vertex
to the child which possesses the higher optimal value. We conclude that a player cannot improve his
performance by making local concessions - unlike in chess, non-greediness will not be rewarded.

Another property of optimal player strategies is that they are not necessarily unique; instead, a simple
stochastic game may possess more than one optimal strategy for a player. As a trivial example, picture
a simple stochastic game which contains a min vertex that has itself and the 0-sink as children. In this
game, the optimal value of the min vertex is 0 and the min player possesses at least two different optimal
strategies for the game - one of which contains the edge to the 0-sink and one of which contains the loop
edge.

We have mentioned that player strategies are independent of the game's history and deduce that they can
be fixed before the start of the game. Once both players have fixed their strategies to be $\tau$ and $\sigma$, a random
walk of the token on the reduced game $G_{\tau,\sigma}$ decides the winner. For the upcoming discussion of
the reduced game $G_{\tau,\sigma}$, let us w.l.o.g. assume that the vertices in $G_{\tau,\sigma}$ are labeled in such a way that the
vertices $1, \dots, t$ are the non-sink vertices that have a path to a sink vertex. With this in mind, we can easily verify that
- following its definition - the value vector $v_{\tau,\sigma}$ of $G_{\tau,\sigma}$ is a solution to the following system of linear
equations: $v_{\tau,\sigma}(n) = 1$, $v_{\tau,\sigma}(i) = 0$ for $t < i < n$ and otherwise

$$v_{\tau,\sigma}(i) = \begin{cases} v_{\tau,\sigma}(j) & \text{if } i \text{ is a min or max vertex with child } j \\ \frac{1}{2}\left(v_{\tau,\sigma}(j) + v_{\tau,\sigma}(k)\right) & \text{if } i \text{ is an average vertex with children } j \text{ and } k \end{cases}$$



which can be written as 



$$v_{\tau,\sigma} = Q v_{\tau,\sigma} + b \quad \Longleftrightarrow \quad (I - Q)\, v_{\tau,\sigma} = b \tag{1}$$



3 A strategy is called pure if it consists of deterministic (as opposed to random) choices by the player.

4 A strategy is called memoryless if the player's choice depends only on the state of the game, and not on its history.
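where $Q \in \mathbb{Q}^{n \times n}$ is related to the topology of $G_{\tau,\sigma}$ as follows: $Q_{ij} = 0$ if $i > t$ and otherwise

$$Q_{ij} = \begin{cases} 1 & \text{if } i \text{ is a min or max vertex with child } j \\ \frac{1}{2} & \text{if } i \text{ is an average vertex with child } j \\ 0 & \text{otherwise} \end{cases}$$

and $b \in \mathbb{Q}^n$ is defined as

$$b_i = \begin{cases} 1 & \text{if } i = n \\ 0 & \text{otherwise} \end{cases}$$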



We will restate the proof given by Condon [4] that (1) has a unique solution. Let $\lambda_i$ be the $i$-th eigenvalue
of $Q$. The idea is to show that $Q^{mn} \to 0$ as $m \to \infty$, from which the following chain of deductions can be
made:

$$\lambda_i \neq 1 \;\; (i = 1, \dots, n) \;\Rightarrow\; \det(Q - I) \neq 0 \;\Rightarrow\; \operatorname{rank}(Q - I) = n \;\Rightarrow\; \text{(1) has a unique solution}$$



Let us denote the upper $t$ rows of $Q$ by the $t \times n$ matrix $Q_t$, which is the 1-step transition matrix of the
non-sink vertices of $G_{\tau,\sigma}$ that have a path to a sink vertex. As a matter of fact, entry $ij$ of $Q_t^{mn} = Q_t \cdots Q_t$
denotes the probability that the token reaches vertex $j$ from vertex $i$ in a random walk on the graph $G_{\tau,\sigma}$
in exactly $mn$ steps. Therefore, the sum of values in the $i$-th row of $Q_t^{mn}$ equals one minus the probability
of reaching a sink vertex from $i$ in $k < mn$ steps. The probability of reaching a sink vertex from $i$ in $k < n$
steps is greater than zero, as $i$ has at least one path to a sink vertex of length no more than the maximal
diameter of $G_{\tau,\sigma}$, which is $n - 1$. Additionally, for $m' > m$, the probability of reaching a sink vertex from
$i$ in $k < m'n$ steps is greater than the probability of reaching a sink vertex from $i$ in $k < mn$
steps. As the values of $Q_t^{mn}$ are all non-negative, it follows that as $m \to \infty$, $Q_t^{mn} \to 0$ and thus $Q^{mn} \to 0$.

Using a local graph search algorithm, one is able to verify in time $O(n)$ whether a given vertex of $G_{\tau,\sigma}$
has a path to a sink vertex. Therefore, $Q$ and $b$ can be constructed from $G_{\tau,\sigma}$ in time $O(n^2)$. By solving
(1) with Gaussian elimination or LU decomposition (both $O(n^3)$), we obtain a cubic time algorithm for the
problem of computing the value vector of $G_{\tau,\sigma}$.
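
A minimal sketch of this procedure, under assumptions of our own: the reduced game is passed as a hypothetical dictionary `succ` that maps every vertex to its remaining children, exact rational arithmetic from Python's `fractions` module replaces floating point, and plain Gauss-Jordan elimination stands in for whichever cubic solver one prefers. A single backward search from the sinks is used instead of one search per vertex.

```python
from fractions import Fraction


def value_vector(n, succ):
    """Value vector of a reduced game in which both strategies are fixed.

    succ[i] holds the children of vertex i in the reduced game: one child
    for a min or max vertex, two for an average vertex, none for a sink.
    Vertex n - 1 is the 0-sink and vertex n the 1-sink.
    """
    # Step 1: a backward search from the sinks marks every vertex that has
    # a path to a sink; all other vertices keep the value 0.
    preds = {i: [] for i in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in succ[i]:
            preds[j].append(i)
    reach, stack = {n - 1, n}, [n - 1, n]
    while stack:
        j = stack.pop()
        for i in preds[j]:
            if i not in reach:
                reach.add(i)
                stack.append(i)

    # Step 2: set up (I - Q) v = b with exact rational entries.
    a = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    b = [Fraction(0)] * n
    b[n - 1] = Fraction(1)
    for i in range(1, n - 1):
        if i in reach:
            weight = Fraction(1, len(succ[i]))  # 1 for players, 1/2 for avg
            for j in succ[i]:
                a[i - 1][j - 1] -= weight

    # Step 3: Gauss-Jordan elimination; afterwards b holds the solution.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / a[col][col]
        a[col] = [x * inv for x in a[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return {i: b[i - 1] for i in range(1, n + 1)}
```

For the hypothetical reduced game `{1: (3,), 2: (4,), 3: (2, 5), 4: (), 5: ()}` with n = 5, the routine yields the value 1/2 for vertices 1 and 3 and the value 0 for vertex 2.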

In a related result, Condon [4] showed that the vertex values $v_{\tau,\sigma}(i)$ of the reduced game $G_{\tau,\sigma}$ are
rational numbers from the set

$$\Omega_t = \{ p/q \in \mathbb{Q} : 0 \le p \le q \le 4^t \} \tag{2}$$

where $t$ is defined as before, i.e. $t$ is the number of non-sink vertices of $G_{\tau,\sigma}$ which have a path to a sink
vertex. To understand why this is true, consider that, as $v_{\tau,\sigma}$ is a solution to (1), the components of $v_{\tau,\sigma}$
can be denoted by $v_{\tau,\sigma}(i) = D_i/D$ where $D$ is the determinant of the matrix $I - Q$ and $D_i$ is the determinant
of $I - Q$ which has the $i$-th column replaced by $b$. Since the components of $I - Q$ and $b$ are all rational,
both $D_i$ and $D$ must also be rational. Condon concludes the proof by showing that $0 \le D_i \le D \le 4^t$ holds.
Note that not all values from the set $\Omega_t$ can occur as vertex values in the game $G_{\tau,\sigma}$; instead, $\Omega_t$ is a
superset - or approximation - of the possible vertex values of $G_{\tau,\sigma}$. In the context of finding a reduced
game's optimal vertex values, $\Omega_t$ can be regarded as a search space of that problem.
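
To get a feeling for the size of this search space, the set from (2) can be enumerated for small t. The sketch below is purely illustrative; the function name `omega` is our own, and q is taken to be at least 1 so that p/q is well defined.

```python
from fractions import Fraction


def omega(t):
    """All distinct rationals p/q with 0 <= p <= q <= 4**t, in increasing order."""
    bound = 4 ** t
    # Fraction normalizes p/q, so duplicates such as 2/4 and 1/2 collapse.
    return sorted({Fraction(p, q)
                   for q in range(1, bound + 1)
                   for p in range(q + 1)})
```

For t = 1 the enumeration consists of the seven values 0, 1/4, 1/3, 1/2, 2/3, 3/4 and 1; the set grows quickly with t, since the denominator bound is exponential in t.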



Figure 2: Visualization of the set $\Omega_2$. The possible vertex values of a simple stochastic game with 4
vertices are a subset of the depicted values.



Concluding the reduced game topic, it is worth mentioning that reduced games can also be studied in the
framework of Markov processes. If we were to assign transition probabilities of 1 to the single edges
leaving player vertices in $G_{\tau,\sigma}$, next to the already established transition probabilities of 1/2 assigned
to the edges leaving average vertices, then the graph $G_{\tau,\sigma}$ modified in this way would formally conform to the
definition of a Markov chain. In a similar fashion, the reduced games $G_\tau$ and $G_\sigma$ can be transformed into
Markov decision processes (MDPs).

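Following its definition, the optimal value vector $v$ of a simple stochastic game G is a solution to the
equation system

$$v(i) = \begin{cases} \max(v(j), v(k)) & \text{if } i \text{ is a max vertex with children } j \text{ and } k \\ \min(v(j), v(k)) & \text{if } i \text{ is a min vertex with children } j \text{ and } k \\ \frac{1}{2}\left(v(j) + v(k)\right) & \text{if } i \text{ is an average vertex with children } j \text{ and } k \\ 0 & \text{if } i = n - 1 \\ 1 & \text{if } i = n \end{cases}$$

which can be written as

$$v = I_G(v) \tag{3}$$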

for $I_G : [0, 1]^n \to [0, 1]^n$ as defined above. Contrary to (1), the equations in (3) are non-linear and a solution
can no longer be derived analytically but rather has to be computed numerically. Shapley [14] showed
that in the case of stopping stochastic games - and hence stopping simple stochastic games - the operator
$I_G$ is contracting on the hypercube $[0, 1]^n$ and therefore has a unique fixed point. In this case, the solution
to (3) is the optimal value vector of the game. In the case of non-stopping simple stochastic games,
however, (3) is necessary, but not sufficient, for the optimal value vector of the game. For an example of
ambiguous values in a simple stochastic game that has only one connected component and is non-trivial,
observe figure 1. In the depicted game, any value below 1/4 can be assigned uniformly to the vertices
{2, 4, 6} without violating (3) - though only 0 is the correct optimal value for each of the vertices. We
finish this paragraph with the statement that, contrary to optimal strategies, the optimal value vector is
unique in every simple stochastic game.
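
The ambiguity of (3) in non-stopping games can be reproduced on a much smaller instance than the one in figure 1. The following sketch uses a hypothetical 3-vertex game of our own (not taken from the paper): a single max vertex whose children are itself and the 0-sink. The encoding via `KIND` and `CHILDREN` is likewise an assumption made for illustration.

```python
from fractions import Fraction


def apply_operator(n, kind, children, v):
    """One application of the operator I_G from (3) to the vector v."""
    out = {n - 1: Fraction(0), n: Fraction(1)}
    for i in range(1, n - 1):
        j, k = children[i]
        if kind[i] == "max":
            out[i] = max(v[j], v[k])
        elif kind[i] == "min":
            out[i] = min(v[j], v[k])
        else:  # average vertex
            out[i] = (v[j] + v[k]) / 2
    return out


# Hypothetical non-stopping game: vertex 1 is a max vertex with a loop edge
# and an edge to the 0-sink (vertex 2); vertex 3 is the 1-sink.
KIND = {1: "max", 2: "sink", 3: "sink"}
CHILDREN = {1: (1, 2), 2: (), 3: ()}


def is_fixed_point(x):
    """Check whether assigning the value x to vertex 1 satisfies v = I_G(v)."""
    v = {1: Fraction(x), 2: Fraction(0), 3: Fraction(1)}
    return apply_operator(3, KIND, CHILDREN, v) == v
```

Every x in [0, 1] passes `is_fixed_point`, although the optimal value of vertex 1 is 0: the max player can only loop forever or move to the 0-sink, and both outcomes are wins for the min player.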

The last result about simple stochastic games which is relevant in the context of this paper is again
due to Condon [4]. She found that from a simple stochastic game G, a stopping simple stochastic
game G' can be constructed whose vertex values are arbitrarily close to the vertex values of G. The
construction rule is as follows: For $\beta = 1/2^{cn}$, the so-called $\beta$-stopping game G' adopts all the vertices
of G but it does not adopt any edge of G. For each edge $(i, j)$ of G, the graph G' instead contains a path
of $m = cn$ average vertices which are connected to the vertices $i$, $j$ and $n - 1$ (the 0-sink) as depicted in
figure 3.




Figure 3: Construction rule for the $\beta$-stopping game G' of G, where $\beta = 1/2^m$ and $m = cn$. The dashed
edge is present in G, but not in G'; it is replaced by the depicted elements.

Two important properties of the $\beta$-stopping game G' have been set forth by Condon:

1. G' can be constructed from G in time $O(n^2)$, where the constant $c$ only has a linear effect on the
runtime of the construction algorithm and is hidden in the O-notation.

2. For arbitrary player strategies $\tau$ and $\sigma$, the corresponding value vectors $v_{\tau,\sigma}$ of G and $v'_{\tau,\sigma}$ of G'
satisfy

$$|v_{\tau,\sigma}(i) - v'_{\tau,\sigma}(i)| \le 2^{(3-c)n} \qquad i \in V \tag{4}$$

We observe that by choosing $c$ large enough, the differences between corresponding vertex values of the
games G and G' can be made arbitrarily small, though G' can still be constructed in time polynomial in
the size of G. We will use this result, as well as previous results from this section, for proving the claims
we make in the next section.



4 Complexity Survey 

We begin this section by showing that - given a simple stochastic game G - the function problems below
have polynomial-time equivalent complexities:

1. What is the value of G?

2. What is the optimal value vector of G?

3. What are optimal player strategies of G?

Proof. "$2 \le_p 1$": Let us assume that algorithm A1 computes the value of the game G. From A1, we
construct an algorithm A2 which computes the optimal value vector of G. On input of G, A2 iterates
over all vertices $i$ of G. In each iteration, A2 changes the start vertex of G to be $i$ and then performs a
run of A1 on G, yielding the optimal vertex value $v(i)$. After the last iteration, A2 outputs the computed
values in the form of the optimal value vector of G. The runtime of A2 is dominated by the queries to A1 and
since A2 makes $n$ queries to A1, A2 is efficient given A1 is.

"$3 \le_p 2$": Suppose algorithm A2 computes the optimal value vector of the game G. We give an informal
description of an algorithm A3 which, using A2, computes optimal player strategies of G. Our tool
will be the result from the previous section that every min (max) strategy which is locally optimal at every
min (max) vertex of G is an optimal min (max) strategy of G; it therefore suffices for A3 to compute
two locally optimal strategies $\tau_{opt}$ and $\sigma_{opt}$ for the players. On input of G, A3 runs A2 on G, yielding the
optimal value vector $v$ of G. For each min vertex $i$ with children $j$ and $k$, A3 adds $(i, j)$ to the initially empty
$\tau_{opt}$ if $v(j) < v(k)$, else A3 adds $(i, k)$ to $\tau_{opt}$. Similarly, for each max vertex $i$ with children $j$ and $k$, A3 adds
$(i, j)$ to the initially empty $\sigma_{opt}$ if $v(j) > v(k)$, else A3 adds $(i, k)$ to $\sigma_{opt}$. Following this construction, $\tau_{opt}$ and
$\sigma_{opt}$ are locally optimal strategies and since A3 makes one query to A2 and performs $O(n)$ instructions,
A3 is efficient given A2 is.

"$1 \le_p 3$": From an algorithm A3 which computes a pair of optimal player strategies of the game G, we
construct an algorithm A1 which computes the value of G. On input of G, A1 runs A3 on G to obtain the
optimal player strategies $\tau_{opt}$ and $\sigma_{opt}$ of G. In time $O(n)$, A1 then constructs the reduced game $G_{\tau_{opt},\sigma_{opt}}$
corresponding to $\tau_{opt}$ and $\sigma_{opt}$. We already argued in the previous section that the value vector (and
hence the value) of a reduced game can be computed in time polynomial in the size of the game. Since
A1 makes one call to A3, A1 is efficient given A3 is. □
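
The strategy extraction performed by A3 can be sketched in a few lines. The code below is an illustration under assumptions of our own: the optimal value vector is supplied as a dictionary `v` (standing in for the output of A2), the game uses a hypothetical `kind`/`children` encoding, strategies are returned as vertex-to-child mappings, and the game is assumed to be stopping so that greedy choices are safe even when both children have equal value.

```python
from fractions import Fraction


def greedy_strategies(kind, children, v):
    """Read a pair of locally optimal strategies off the optimal value vector v."""
    tau, sigma = {}, {}
    for i, kids in children.items():
        if kind[i] == "min":
            j, k = kids
            # The min player moves to the child with the lower optimal value.
            tau[i] = j if v[j] < v[k] else k
        elif kind[i] == "max":
            j, k = kids
            # The max player moves to the child with the higher optimal value.
            sigma[i] = j if v[j] > v[k] else k
    return tau, sigma


# Hypothetical stopping game: vertex 4 is the 0-sink, vertex 5 the 1-sink.
KIND = {1: "max", 2: "min", 3: "avg", 4: "sink", 5: "sink"}
CHILDREN = {1: (2, 3), 2: (3, 4), 3: (4, 5), 4: (), 5: ()}
OPTIMAL = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2),
           4: Fraction(0), 5: Fraction(1)}
```

Here `greedy_strategies(KIND, CHILDREN, OPTIMAL)` sends the min vertex 2 to the 0-sink and the max vertex 1 to the average vertex 3. The tie-breaking rule mirrors the one in the proof: the second child is chosen unless the first is strictly better.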

SSG-TWOKIND (function) 



Input: Simple stochastic game G, lacking one vertex kind 
Question: What is the optimal value vector v of G? 

Complexity class: FP 
Algorithms: 

Papers discussing algorithms for SSG-TWOKIND include [1, 5-7, 15, 16]. In the case that G lacks min
vertices, SSG-TWOKIND can be expressed as the following linear optimization (or programming) problem,
as was first shown by Derman [6]:

$$\sum_{i=1}^{n} v(i) \to \min$$

subject to $v(n - 1) = 0$, $v(n) = 1$ and

$$v(i) \ge \begin{cases} v(j) & \text{if } i \text{ is a max vertex with child } j \\ \frac{1}{2}\left(v(j) + v(k)\right) & \text{if } i \text{ is an average vertex with children } j \text{ and } k \end{cases} \qquad 1 \le i \le n$$

Similarly, in the case that G lacks max vertices, Derman showed that the optimal value vector of G is the
unique solution to the linear optimization problem

$$\sum_{i=1}^{n} v(i) \to \max$$

subject to $v(n - 1) = 0$, $v(n) = 1$ and

$$v(i) \le \begin{cases} v(j) & \text{if } i \text{ is a min vertex with child } j \\ \frac{1}{2}\left(v(j) + v(k)\right) & \text{if } i \text{ is an average vertex with children } j \text{ and } k \end{cases} \qquad 1 \le i \le n$$

Khachian [10] was the first to show that linear programming problems can be solved in time polynomial
in the number of bits needed to describe the problem. Since the number of bits needed to encode any of the above
formulas is a polynomial function of $n$ [4], we obtain the result that the optimal value vector $v$ of G can be computed
in polynomial time.

So-called interior point algorithms for linear programming problems, which can move through the
feasible region⁵ rather than just along its boundary, perform best in practice. According to Wikipedia,
Mehrotra's [13] interior point algorithm is regarded as the fastest, though a worst-case bound is not
available. To date, there exists no strongly polynomial time algorithm for linear programming
problems, i.e. one that is polynomial in the number of variables of the problem up to order ~4.

In the case that G lacks average vertices, the algorithm given in appendix A correctly computes the
optimal value vector of G. The number of executions of the repeat loop is $O(n)$, as D is static after at
most $n - 2$ executions. It follows that the algorithm has quadratic runtime, which is a much better result
than that obtained for the above linear programming problems.

SSG-OVV (function)

Input: Simple stochastic game G
Question: What is the optimal value vector v of G?

Complexity class: FNP

We will first give an informal description of the proof that SSG-OVV ∈ FNP. Let $\Omega_n \supseteq \Omega_t$ be a superset of
the possible vertex values of G, as discussed in (2). The proof is based on the fact that there exists exactly
one vector $z$ from the set $\Omega_n^n$ - namely the optimal value vector of G - which satisfies $|z(i) - v'(i)| < 4^{-2n}/2$
for all $i \in V$, where $v'$ is the optimal value vector of the $1/2^{9n}$-stopping game G' of G. We deduce
that a nondeterministic Turing machine M for problem SSG-OVV could guess an arbitrary vector $z \in \Omega_n^n$
and - assuming M knows $v'$ - argue in polynomial time that $z = v$ if and only if $z$ satisfies the mentioned
constraint. Of course, M cannot compute $v'$ from scratch, but since M is nondeterministic, we can simply
let it, next to $v$, also guess $v'$. By evaluating (3), M can easily verify whether its guess of $v'$ is correct.
Therefore, letting M guess the optimal value vectors of both G and G', M is able to conduct a polynomial
time verification of both guesses.

5 The feasible region of a linear programming problem is the set of variable evaluations (vertices) which satisfy the
constraints given in the problem.

Considering a formal proof, we first show that the difference between two vertex values of G is either
0 or has a lower bound $\delta$. Let $\alpha = p/q$ and $\beta = p'/q'$ with $\alpha, \beta \in \Omega_n$. Then either $|\alpha - \beta| = 0$ or

$$|\alpha - \beta| = \left| \frac{p}{q} - \frac{p'}{q'} \right| = \frac{|pq' - p'q|}{qq'} \ge \frac{1}{qq'} \ge \frac{1}{4^n 4^n} = 4^{-2n} =: \delta$$
Following (4), we further observe that the differences between corresponding vertex values in the game
G and the associated $1/2^{9n}$-stopping game G' are bounded by $\delta/2$, i.e. for arbitrary strategies $\tau$ and $\sigma$

$$|v_{\tau,\sigma}(i) - v'_{\tau,\sigma}(i)| \le 2^{-6n} = 4^{-3n} < \delta/2 \qquad i \in V$$

On input of G, a nondeterministic Turing machine M for problem SSG-OVV first guesses one vector from
the set $\Omega_n^n$ to be the optimal value vector of G and one vector from the set $\Omega_{n'}^{n'}$ to be the optimal value
vector of G', where $n' = 9n|E| + n$. Let us denote these vectors by the tuple $(z, s)$. If we define that M
accepts $(z, s)$ if $z = I_G(z)$, $s = I_{G'}(s)$ and $|z(i) - s(i)| < \delta/2$ for all $i \in V$, then M accepts $(z, s)$ if and only
if $z = v \wedge s = v'$. It should also be clear that M comes to a conclusion in time polynomial in the number
of vertices of G, as in particular M is able to construct G' from G in time $O(n^2)$.



Proof. Instead of saying "M accepts $(z, s)$", we will just say "M accepts". "$\Rightarrow$": If M accepts, $s = I_{G'}(s)$
and hence $s = v'$, because G' is stopping and (3) therefore has a unique solution for G'. It remains to prove
that if M accepts, $z = v$ holds. Suppose M accepts but $z \neq v$. If
$z \neq v$, then $|z(i) - v(i)| \ge \delta$ for at least one $i \in V$. Since M accepts, $|z(i) - s(i)| = |z(i) - v'(i)| < \delta/2$ for
all $i \in V$. It follows that $|v(i) - v'(i)| > \delta/2$ for at least one $i \in V$, which contradicts the construction of
G'. "$\Leftarrow$": Suppose $z = v$ and $s = v'$. Then obviously $z = I_G(z)$, $s = I_{G'}(s)$ and $|z(i) - s(i)| < \delta/2$ for all
$i \in V$ by construction of G'. Hence M accepts. □

Algorithms: 

All algorithms [2, 5, 7, 11, 12, 15, 16] suggested to date for the problem of finding a solution to (3) have
exponential time complexities. The most intuitive of those operates on the basis of the iterative update rule
$v_{i+1} = I_G(v_i)$ and is called the successive approximation or value iteration algorithm. As already mentioned,
this algorithm is guaranteed to converge to the correct solution only if G is stopping. In her paper about
algorithms for simple stochastic games, Condon [5] presents a "worst case" example for the successive
approximation algorithm in the form of a special game graph, where the algorithm takes an exponential
number of updates until it finds the optimal value vector.
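
The update rule can be sketched as follows. This is a minimal illustration, not Condon's formulation: the `kind`/`children` encoding, the fixed number of `rounds` and the starting vector (0 everywhere except at the 1-sink) are assumptions of ours, and exact fractions are used so that the iterates can be inspected without rounding.

```python
from fractions import Fraction


def value_iteration(n, kind, children, rounds):
    """Apply v <- I_G(v) a fixed number of times, starting from v = 0, v(n) = 1."""
    v = {i: Fraction(0) for i in range(1, n + 1)}
    v[n] = Fraction(1)
    for _ in range(rounds):
        new = {n - 1: Fraction(0), n: Fraction(1)}
        for i in range(1, n - 1):
            j, k = children[i]
            if kind[i] == "max":
                new[i] = max(v[j], v[k])
            elif kind[i] == "min":
                new[i] = min(v[j], v[k])
            else:  # average vertex
                new[i] = (v[j] + v[k]) / 2
        v = new
    return v


# Hypothetical stopping game with a loop: vertex 1 is an average vertex whose
# children are itself and the 1-sink (vertex 3); vertex 2 is the 0-sink.
KIND = {1: "avg", 2: "sink", 3: "sink"}
CHILDREN = {1: (1, 3), 2: (), 3: ()}
```

In this small game the iterate at vertex 1 after r rounds is 1 - 2^(-r); it approaches the optimal value 1 but never reaches it, which illustrates why the algorithm in general only converges in the limit.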

So-called strategy improvement algorithms try to iteratively improve an initial pair of strategies until
convergence. A particularly simple algorithm of this class is the one of Hoffman & Karp [11], for which
a worst case running time of $O(2^n/n)$ is established. Bjorklund & Vorobyov's [2] randomized algorithm
dating from 2005 has a worst case running time of $O(2^{\dots})$ which, to the best of the author's knowledge, is the
best result obtained to date.



SSG-VALUE* (decision)

Input: Simple stochastic game G, $\alpha \in \Omega_n$
Question: Is the value of G > $\alpha$?

Complexity class: NP ∩ coNP

SSG-VALUE* is a straightforward extension of the SSG-VALUE problem, which is defined by $\alpha = 1/2$.
Using the same terminology as for the previous problem, a nondeterministic Turing machine M for
SSG-VALUE* first guesses one vector from the set $\Omega_{n'}^{n'}$ to be the optimal value vector of the $1/2^{9n}$-stopping
game G' of G, where $n' = 9n|E| + n$. If we denote this vector by $s$ and define that M accepts $s$ if $s = I_{G'}(s)$
and $s(\text{start}) > \alpha$, then M accepts $s$ if and only if value > $\alpha$ $\wedge$ $s = v'$. It should also be clear that M comes
to a conclusion in time polynomial in the size of G.

Proof. "$\Rightarrow$": If M accepts $s$, then $s = I_{G'}(s)$ and therefore $s = v'$. It remains to show that if M accepts
$s$, value > $\alpha$. Suppose M accepts $s$ but value $\le \alpha$. If value $\le \alpha$ then value' = $s(\text{start}) \le \alpha$ since the
stopping game G' always has lower value by construction. Hence M cannot accept $s$. "$\Leftarrow$": Suppose
value > $\alpha$ and $s = v'$. Then value $\ge \alpha + \delta$ and since, by construction of G', |value - $s(\text{start})$| < $\delta/2$, it
follows that $s(\text{start}) > \alpha$. Since also $s = I_{G'}(s)$, M accepts $s$. Applying a small modification to M by
letting it accept if $s(\text{start}) \le \alpha$, M can obviously also decide the complement of SSG-VALUE*. □

Algorithms: 

The author is not aware of any algorithm specially tailored for SSG-VALUE*; instead, algorithms for
the more general SSG-OVV are used to solve SSG-VALUE*. Theoretically, algorithms for
SSG-VALUE* could exploit the circumstance that - depending on the topology of G - not all optimal vertex
values of G would need to be computed in order to solve the problem. However, as we are mainly concerned with
the worst case behavior of such algorithms, the case where the value of G depends on its whole game
graph must be assumed. Therefore, the optimal value vector of G must be computed after all.

5 Conclusion and Open Problems 

We have seen that the most interesting and also most difficult simple stochastic game problem, that of
computing the optimal value vector, is hard to solve. However, restricting the input to simple stochastic
games that lack one vertex kind, we observed that the same problem becomes tractable and can be solved
rather efficiently. Another result we obtained was that the value vector of reduced games can be computed
in polynomial time, i.e. that given a simple stochastic game and a pair of player strategies, the question
about the winning probabilities of the players is efficiently answerable.

Sadly, we were not able to show polynomial time equivalence of SSG-OVV and SSG-VALUE*, but we
would still like to know what the decision equivalent of SSG-OVV is. Another major open problem is to
show completeness of SSG-VALUE (or SSG-OVV) for a specific complexity class, thereby specifying the
problem's exact complexity. As a last word, Condon expressed the possibility that finding an algorithm
that separates simple stochastic games with low value ($\alpha$ < 0.25) from those with high value ($\alpha$ > 0.75)
might be of significance in solving the master problem: is SSG-VALUE ∈ P?
Appendix A

Algorithm 1 optimalValueVector(G)
input simple stochastic game G = (V, E)
output optimal value vector v of G
require V_avg = ∅
begin
    D = {n-1, n}, v = 0, v(n) = 1
    repeat
        for i ∈ V \ D do
            if i is a max vertex with a 1-valued child in D then
                D = D ∪ {i}, v(i) = 1
            else if i is a max vertex with two 0-valued children in D then
                D = D ∪ {i}, v(i) = 0
            else if i is a min vertex with a 0-valued child in D then
                D = D ∪ {i}, v(i) = 0
            else if i is a min vertex with two 1-valued children in D then
                D = D ∪ {i}, v(i) = 1
            end if
        end for
    until D is static
    return v
end
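
For readers who want to experiment with Algorithm 1, the following Python sketch is one possible transcription. The `kind`/`children` encoding is a hypothetical choice of ours, the set D is represented implicitly by the keys of the dictionary `value`, and vertices that are never added to D receive the value 0, matching the initialization v = 0.

```python
def optimal_value_vector_no_avg(n, kind, children):
    """Optimal value vector of a game without average vertices (Algorithm 1)."""
    value = {n - 1: 0, n: 1}  # D = decided vertices, initially the two sinks
    changed = True
    while changed:  # repeat ... until D is static
        changed = False
        for i in range(1, n - 1):
            if i in value:
                continue
            # Values of those children of i that are already in D.
            decided = [value[c] for c in children[i] if c in value]
            if kind[i] == "max":
                if 1 in decided:
                    value[i] = 1
                elif decided == [0, 0]:
                    value[i] = 0
            elif kind[i] == "min":
                if 0 in decided:
                    value[i] = 0
                elif decided == [1, 1]:
                    value[i] = 1
            else:
                raise ValueError("average vertices are not allowed")
            changed = changed or i in value
    # Undecided vertices lie on cycles that only the min player benefits from.
    return {i: value.get(i, 0) for i in range(1, n + 1)}


# Hypothetical game: vertex 6 is the 0-sink, vertex 7 the 1-sink.
KIND = {1: "max", 2: "min", 3: "min", 4: "max", 5: "max", 6: "sink", 7: "sink"}
CHILDREN = {1: (2, 3), 2: (6, 7), 3: (7, 4), 4: (4, 7), 5: (5, 6), 6: (), 7: ()}
```

On this example, vertex 5 (a max vertex with a loop edge and an edge to the 0-sink) is never added to D and therefore keeps the value 0, while vertices 1, 3 and 4 obtain the value 1.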